22 research outputs found
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
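The abstract describes combining Deep Deterministic Policy Gradients and Hindsight Experience Replay with a small set of demonstrations. A minimal sketch of one way such an actor update could blend the RL objective with a behavior-cloning term on demonstrations, assuming PyTorch and hypothetical `actor`, `critic`, and batch structures (the Q-filtered gating shown here is an illustrative design choice, not necessarily the paper's exact loss):

```python
import torch

def actor_loss(actor, critic, batch, demo_batch, bc_weight=1.0):
    # Standard DDPG actor objective: ascend Q(s, pi(s)).
    ddpg_loss = -critic(batch["obs"], actor(batch["obs"])).mean()

    # Behavior cloning on demonstrations, gated by a Q-filter: only imitate
    # a demo action where the critic thinks it beats the policy's own action.
    demo_obs, demo_act = demo_batch["obs"], demo_batch["act"]
    pi_demo = actor(demo_obs)
    q_demo = critic(demo_obs, demo_act).squeeze(-1)  # shape (B,)
    q_pi = critic(demo_obs, pi_demo).squeeze(-1)     # shape (B,)
    mask = (q_demo > q_pi).float().detach()          # 1 where the demo wins
    bc_loss = (mask * (pi_demo - demo_act).pow(2).sum(dim=-1)).mean()

    return ddpg_loss + bc_weight * bc_loss
```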
Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
General-purpose robots require diverse repertoires of behaviors to complete
challenging tasks in real-world unstructured environments. To address this
issue, goal-conditioned reinforcement learning aims to acquire policies that
can reach configurable goals for a wide range of tasks on command. However,
such goal-conditioned policies are notoriously difficult and time-consuming to
train from scratch. In this paper, we propose Planning to Practice (PTP), a
method that makes it practical to train goal-conditioned policies for
long-horizon tasks that require multiple distinct types of interactions to
solve. Our approach is based on two key ideas. First, we decompose the
goal-reaching problem hierarchically: a high-level planner uses conditional
subgoal generators to propose intermediate subgoals in a learned latent space
for a low-level model-free policy. Second, we propose a hybrid approach that
first pre-trains both the conditional subgoal generator and the policy on
previously collected data through offline reinforcement learning, and then
fine-tunes the policy via online exploration. This fine-tuning process is
itself facilitated by the planned subgoals, which break down the original
target task into short-horizon goal-reaching tasks that are significantly
easier to learn. We conduct experiments in both simulation and the real world,
in which the policy is pre-trained on demonstrations of short primitive
behaviors and fine-tuned for temporally extended tasks that are unseen in the
offline data. Our experimental results show that PTP can generate feasible
sequences of subgoals that enable the policy to efficiently solve the target
tasks.
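As a rough illustration of the hierarchical decomposition described above, the sketch below has a conditional subgoal generator recursively propose latent subgoals between the current state and the goal, while a low-level goal-conditioned policy pursues each subgoal in turn; `encoder`, `policy`, `subgoal_generator`, and the gym-style `env` are hypothetical placeholders, not the paper's actual interfaces:

```python
def plan_subgoals(subgoal_generator, z_start, z_goal, depth=2):
    """Recursively bisect (z_start, z_goal) into a sequence of latent subgoals."""
    if depth == 0:
        return []
    z_mid = subgoal_generator(z_start, z_goal)  # sample a midpoint subgoal
    return (plan_subgoals(subgoal_generator, z_start, z_mid, depth - 1)
            + [z_mid]
            + plan_subgoals(subgoal_generator, z_mid, z_goal, depth - 1))

def rollout(env, encoder, policy, subgoal_generator, z_goal, steps_per_goal=50):
    obs = env.reset()
    subgoals = plan_subgoals(subgoal_generator, encoder(obs), z_goal) + [z_goal]
    for z_sub in subgoals:  # each subgoal is a short-horizon task
        for _ in range(steps_per_goal):
            action = policy(encoder(obs), z_sub)
            obs, reward, done, info = env.step(action)
            if done:
                return obs
    return obs
```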
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Manipulation of deformable objects, such as ropes and cloth, is an important
but challenging problem in robotics. We present a learning-based system where a
robot takes as input a sequence of images of a human manipulating a rope from
an initial to a goal configuration, and outputs a sequence of actions that can
reproduce the human demonstration, using only monocular images as input. To
perform this task, the robot learns a pixel-level inverse dynamics model of
rope manipulation directly from images in a self-supervised manner, using about
60K interactions with the rope collected autonomously by the robot. The human
demonstration provides a high-level plan of what to do and the low-level
inverse model is used to execute the plan. We show that by combining the high
and low-level plans, the robot can successfully manipulate a rope into a
variety of target shapes using only a sequence of human-provided images for
direction.
Comment: 8 pages, accepted to International Conference on Robotics and
Automation (ICRA) 201
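To make the pipeline concrete, here is a simplified sketch of an inverse dynamics model that maps a pair of images (current frame, next demo frame) to an action, assuming PyTorch; the architecture, the continuous-action regression head, and all names are illustrative simplifications rather than the paper's actual model:

```python
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    """Predicts the action connecting two consecutive images."""
    def __init__(self, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(          # shared CNN over each image
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(2 * 64, action_dim)

    def forward(self, img_t, img_t1):
        feats = torch.cat([self.encoder(img_t), self.encoder(img_t1)], dim=-1)
        return self.head(feats)

# At execution time, step through the human demo frames and query the model
# for the action that should move the rope toward each next frame.
def follow_demo(model, env, demo_frames, current_image_fn):
    for target in demo_frames[1:]:
        action = model(current_image_fn().unsqueeze(0), target.unsqueeze(0))[0]
        env.step(action.detach().numpy())
```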
Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
Learning-based methods in robotics hold the promise of generalization, but
what can be done if a learned policy does not generalize to a new situation? In
principle, if an agent can at least evaluate its own success (i.e., with a
reward classifier that generalizes well even when the policy does not), it
could actively practice the task and finetune the policy in this situation. We
study this problem in the setting of industrial insertion tasks, such as
inserting connectors in sockets and setting screws. Existing algorithms rely on
precise localization of the connector or socket and carefully managed physical
setups, such as assembly lines, to succeed at the task. But in unstructured
environments such as homes or even some industrial settings, robots cannot rely
on precise localization and may be tasked with previously unseen connectors.
Offline reinforcement learning on a variety of connector insertion tasks is a
potential solution, but what if the robot is tasked with inserting a previously
unseen connector? In such a scenario, we still need methods that can
robustly solve such tasks with online practice. One of the main observations we
make in this work is that, with a suitable representation learning and domain
generalization approach, it can be significantly easier for the reward function
to generalize to a new but structurally similar task (e.g., inserting a new
type of connector) than for the policy. This means that a learned reward
function can be used to facilitate the finetuning of the robot's policy in
situations where the policy fails to generalize in zero shot, but the reward
function generalizes successfully. We show that such an approach can be
instantiated in the real world, pretrained on 50 different connectors, and
successfully finetuned to new connectors via the learned reward function.
Videos can be viewed at https://sites.google.com/view/learningonthejob
Comment: 10 pages
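The core loop the abstract describes, in which a reward classifier that generalizes better than the policy supplies the learning signal for online practice, could look roughly like the following sketch; `env`, `policy`, `reward_classifier`, `replay`, and `update_fn` are hypothetical placeholders, not the paper's actual API:

```python
def finetune_on_new_connector(env, policy, reward_classifier, replay, update_fn,
                              episodes=100, horizon=50):
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(horizon):
            action = policy(obs)
            next_obs, _, done, _ = env.step(action)
            # The reward classifier generalizes across connectors even when
            # the policy does not, so it can self-label outcomes from vision.
            reward = reward_classifier(next_obs)
            replay.append((obs, action, reward, next_obs, done))
            obs = next_obs
            if done:
                break
        update_fn(policy, replay)  # e.g., an off-policy RL update
    return policy
```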
Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
The use of broad datasets has proven crucial for generalization
across a wide range of fields. However, how to effectively use diverse
multi-task data for novel downstream tasks remains a grand challenge in
robotics. To tackle this challenge, we introduce a framework that acquires
goal-conditioned policies for unseen temporally extended tasks via offline
reinforcement learning on broad data, in combination with online fine-tuning
guided by subgoals in learned lossy representation space. When faced with a
novel task goal, the framework uses an affordance model to plan a sequence of
lossy representations as subgoals, decomposing the original task into easier
problems. Learned from the broad data, the lossy representation emphasizes
task-relevant information about states and goals while abstracting away
redundant contexts that hinder generalization. It thus enables subgoal planning
for unseen tasks, provides a compact input to the policy, and facilitates
reward shaping during fine-tuning. We show that our framework can be
pre-trained on large-scale datasets of robot experiences from prior work and
efficiently fine-tuned for novel tasks, entirely from visual inputs without any
manual reward engineering.
Comment: CoRL 202
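As a loose sketch of the planning-and-shaping idea, an affordance model could chain lossy latent subgoals toward a novel goal, with distance in the representation space providing a shaped fine-tuning reward; the `affordance_model.sample` interface and the distance-based shaping below are assumptions for illustration only:

```python
import numpy as np

def plan_affordance_chain(affordance_model, encoder, obs, goal_image, steps=4):
    """Sample a sequence of reachable lossy subgoals from obs toward the goal."""
    z, z_goal = encoder(obs), encoder(goal_image)
    chain = []
    for _ in range(steps):
        z = affordance_model.sample(z, z_goal)  # next reachable representation
        chain.append(z)
    return chain + [z_goal]

def shaped_reward(encoder, obs, z_subgoal, success_threshold=0.5):
    """Dense reward from proximity to the current subgoal in latent space."""
    dist = np.linalg.norm(encoder(obs) - z_subgoal)
    return -dist + (1.0 if dist < success_threshold else 0.0)
```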